    Spatio-Temporal Multiway Data Decomposition Using Principal Tensor Analysis on k-Modes: The R Package PTAk

    The purpose of this paper is to describe the R package PTAk and how the spatio-temporal context can be taken into account in the analyses. Essentially PTAk() is a multiway multidimensional method to decompose a multi-entry data array, seen mathematically as a tensor of any order. This PTAk-modes method proposes a way of generalizing SVD (singular value decomposition), as well as some other well-known methods included in the R package, such as PARAFAC or CANDECOMP and the PCAn-modes or Tucker-n model. The example datasets cover different domains with various spatio-temporal characteristics and issues: (i) medical imaging in neuropsychology with a functional MRI (magnetic resonance imaging) study, (ii) pharmaceutical research with a pharmacodynamic study with EEG (electroencephalographic) data for a central nervous system (CNS) drug, and (iii) geographical information system (GIS) with a climatic dataset that characterizes arid and semi-arid variations. All the methods implemented in the R package PTAk also support non-identity metrics, as well as penalizations during the optimization process. As a result of these flexibilities, together with pre-processing facilities, PTAk constitutes a framework for devising extensions of multidimensional methods such as correspondence analysis, discriminant analysis, and multidimensional scaling, also enabling spatio-temporal constraints.
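
    As a quick illustration of the interface described above, the sketch below decomposes a small simulated site-by-variable-by-month array with PTAk(). The simulated dimensions are arbitrary, and calling the function with all defaults (and relying on a summary method for the result) is an assumption to be checked against the package documentation.

        ## Hedged sketch: decomposing a simulated 3-way (site x variable x month)
        ## array with the PTAk package. The default arguments and the summary()
        ## call are assumptions; consult the package documentation.
        library(PTAk)

        set.seed(42)
        X <- array(rnorm(10 * 6 * 12), dim = c(10, 6, 12))
        dimnames(X) <- list(paste0("site", 1:10), paste0("var", 1:6), month.abb)

        res <- PTAk(X)      # principal tensors of the 3-way array
        summary(res)        # assumed to list singular values and % of variability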

    A singular value decomposition of a k-way array for a principal component analysis of multiway data, PTA-k

    Employing a tensorial approach to describe a k-way array, the singular value decomposition of this type of multiarray is established. The algorithm given to attain a singular value, based on a generalization of the transition formulae, has a Gauss-Seidel form. A recursive algorithm leads to the decomposition termed SVD-k. A generalization of the Eckart-Young theorem is introduced by consideration of new rank concepts: the orthogonal rank and the free orthogonal rank. The application of this generalization in data analysis is illustrated by a principal component analysis (PCA) over k modes, termed PTA-k, which conserves most of the properties of a PCA.
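
    To make the Gauss-Seidel use of the transition formulae concrete, here is a minimal base-R sketch that extracts one singular value (a rank-1 principal tensor) of a 3-way array by alternating contractions. It illustrates the idea only and is not the PTAk package implementation.

        ## Rank-1 principal tensor of a 3-way array by alternately applying the
        ## transition formulae (Gauss-Seidel style): each mode's vector is updated
        ## by contracting the array against the other two, then normalised.
        ## A base-R sketch of the idea, not the PTAk package code.
        rank1_tensor <- function(X, tol = 1e-10, maxit = 200) {
          d <- dim(X)
          u <- rep(1, d[1]) / sqrt(d[1])
          v <- rep(1, d[2]) / sqrt(d[2])
          w <- rep(1, d[3]) / sqrt(d[3])
          sigma_old <- 0
          for (it in seq_len(maxit)) {
            u <- apply(X, 1, function(s) sum(s * outer(v, w))); u <- u / sqrt(sum(u^2))
            v <- apply(X, 2, function(s) sum(s * outer(u, w))); v <- v / sqrt(sum(v^2))
            w <- apply(X, 3, function(s) sum(s * outer(u, v))); w <- w / sqrt(sum(w^2))
            sigma <- sum(X * outer(u, outer(v, w)))   # current singular value
            if (abs(sigma - sigma_old) < tol) break
            sigma_old <- sigma
          }
          list(sigma = sigma, u = u, v = v, w = w)
        }

        set.seed(1)
        A <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))
        str(rank1_tensor(A))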

    Full Metadata Object profiling for flexible geoprocessing workflows

    The design and running of complex geoprocessing workflows is an increasingly common geospatial modelling and analysis task. The Business Process Model and Notation (BPMN) standard, which provides a graphical representation of a workflow, allows stakeholders to discuss the scientific conceptual approach behind this modelling while also defining a machine-readable encoding in XML. Previous research has enabled the orchestration of Open Geospatial Consortium (OGC) Web Processing Services (WPS) with a BPMN workflow engine. However, the need for direct access to pre-defined data inputs and outputs results in a lack of flexibility during composition of the workflow and of efficiency during execution. This article develops metadata profiling approaches, described as two possible configurations, which enable workflow management at the meta-level through a coupling with a metadata catalogue. Specifically, a WPS profile and a BPMN profile are developed and tested using open-source components to achieve this coupling. A case study in the context of an event mapping task applied within a big data framework and based on analysis of the Global Database of Events, Language, and Tone (GDELT) illustrates the two different architectures.
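
    For context on what a single orchestrated step looks like, a hedged sketch of invoking an OGC WPS 1.0.0 process over its key-value-pair interface from R follows. The endpoint URL and process identifier are hypothetical placeholders, and the BPMN engine and metadata-catalogue coupling described above are not reproduced here.

        ## Hedged sketch: one WPS 1.0.0 Execute request over HTTP KVP, the kind of
        ## call a BPMN engine step might issue. The endpoint and process identifier
        ## are hypothetical placeholders.
        library(httr)

        wps <- "https://example.org/wps"   # hypothetical WPS endpoint

        ## Discover the processes the service offers
        caps <- GET(wps, query = list(service = "WPS", request = "GetCapabilities"))

        ## Run one process with literal inputs
        resp <- GET(wps, query = list(
          service    = "WPS",
          version    = "1.0.0",
          request    = "Execute",
          identifier = "org.example.gdelt.EventDensity",            # hypothetical id
          DataInputs = "startDate=2015-01-01;endDate=2015-01-31"    # hypothetical inputs
        ))
        cat(content(resp, as = "text"))    # raw ExecuteResponse XML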

    A flexible framework for assessing the quality of crowdsourced data

    Papers, communications and posters presented at the 17th AGILE Conference on Geographic Information Science "Connecting a Digital Europe through Location and Place", held at Universitat Jaume I from 3 to 6 June 2014.
    Crowdsourcing as a means of data collection has produced previously unavailable data assets and enriched existing ones, but its quality can be highly variable. This presents several challenges to potential end users who are concerned with the validation and quality assurance of the data collected. Being able to quantify the uncertainty, define and measure the different quality elements associated with crowdsourced data, and introduce means for dynamically assessing and improving it is the focus of this paper. We argue that the required quality assurance and quality control are dependent on the studied domain, the style of crowdsourcing and the goals of the study. We describe a framework for qualifying geolocated data collected from non-authoritative sources that enables assessment for specific case studies by creating a workflow supported by an ontological description of a range of choices. The top levels of this ontology describe seven pillars of quality checks and assessments that present a range of techniques to qualify, improve or reject data. Our generic operational framework allows for extension of this ontology to specific applied domains. This will facilitate quality assurance in real time or for post-processing to validate data and produce quality metadata. It enables a system that dynamically optimises the usability value of the data captured. A case study illustrates this framework.
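
    To make the idea of a configurable chain of checks concrete, here is a small hedged sketch in R. The individual checks, field names and thresholds are illustrative assumptions and do not reproduce the seven pillars of the ontology described above.

        ## Hedged sketch of a configurable quality-control chain for geolocated
        ## crowdsourced records. The checks, field names and thresholds below are
        ## illustrative assumptions, not the paper's ontology of seven pillars.
        checks <- list(
          within_study_area    = function(d) d$lon >= -1.5 & d$lon <= 1.5 &
                                             d$lat >= 50.0 & d$lat <= 53.0,
          positional_precision = function(d) d$gps_accuracy_m <= 50,
          attribute_complete   = function(d) !is.na(d$observation)
        )

        assess_quality <- function(d, checks) {
          passed <- sapply(checks, function(chk) chk(d))   # one column per check
          d$quality_score <- rowMeans(passed)              # share of checks passed
          d$accepted <- d$quality_score == 1               # or any domain-specific rule
          d
        }

        obs <- data.frame(lon = c(0.2, 2.9), lat = c(51.1, 52.0),
                          gps_accuracy_m = c(12, 80), observation = c("flood", NA))
        assess_quality(obs, checks)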

    Simple, multiple and multiway correspondence analysis applied to spatial census-based population microsimulation studies using R

    As a bivariate and multivariate multidimensional exploratory method, simple and multiple correspondence analyses have been used successfully in social science for describing survey or questionnaire results. Nonetheless, the complexity of social interactions, including health status indicators, along with the need to take into account the spatial and temporal realm of the survey, may incline one to look at variable associations in a multiway approach instead of a two-way matrix analysis. This means, for example, that an interaction of order three between the spatial configuration (say the Output Areas of an urban zone), the set of categorical variables (say selected from a census survey) and the evolution (say every 5 years over a 30-year period) would be considered in order to differentiate spatio-temporal associations across categorical variables. For census-based spatial simulation models such as microsimulations, exhibiting this kind of property is useful, as it forecasts movements of population characteristics to be considered for healthcare policy scenario analysis. In this paper it is shown how to run this type of analysis within R using a package dedicated to multiway analysis (the R package PTAk), that is, working on multi-entry array data using an algorithm extending classical multidimensional analysis. A didactic approach, moving from two-way analyses to multiway ones on the same dataset generated from a population spatial simulation model, allows a critical demonstration of the potential of the different methods. Particular attention is also given to the different choices of spatial units and the scale variation effect within a nested administrative zoning system, which can be analysed by a correspondence analysis with respect to a model (extending the approach using the independence model) and which can be done for a simple, multiple or multiway correspondence analysis.
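
    As a reminder of the two-way building block that the multiway analysis generalises, here is a minimal base-R sketch of simple correspondence analysis as an SVD of the standardised residuals from the independence model. The cross-tabulation is simulated and the code does not use the PTAk package itself.

        ## Simple correspondence analysis of a two-way contingency table as the SVD
        ## of the standardised residuals from the independence model -- a base-R
        ## sketch of the two-way case that the multiway analysis generalises.
        ca_simple <- function(N) {
          P <- N / sum(N)                            # correspondence matrix
          rmass <- rowSums(P); cmass <- colSums(P)   # row and column masses
          E <- outer(rmass, cmass)                   # independence model
          S <- (P - E) / sqrt(E)                     # standardised residuals
          dec <- svd(S)
          rows <- sweep(dec$u, 1, sqrt(rmass), "/") %*% diag(dec$d)  # row principal coords
          cols <- sweep(dec$v, 1, sqrt(cmass), "/") %*% diag(dec$d)  # column principal coords
          list(inertia = dec$d^2, rows = rows, cols = cols)
        }

        ## e.g. a simulated cross-tabulation of output area by categorical variable
        set.seed(3)
        tab <- table(sample(paste0("OA", 1:5), 500, replace = TRUE),
                     sample(c("A", "B", "C"), 500, replace = TRUE))
        ca_simple(as.matrix(tab))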

    Red and processed meat consumption and purchasing behaviours and attitudes: impacts for human health, animal welfare and environmental sustainability

    Objective: Higher intakes of red and processed meat are associated with poorer health outcomes and negative environmental impacts. Drawing upon a population survey, the present paper investigates meat consumption behaviours, exploring perceived impacts for human health, animal welfare and the environment. Design: Structured self-completion postal survey relating to red and processed meat, capturing data on attitudes, sustainable meat purchasing behaviour, red and processed meat intake, plus sociodemographic characteristics of respondents. Setting: Urban and rural districts of Nottinghamshire, East Midlands, UK, drawn from the electoral register. Subjects: UK adults (n 842) aged 18–91 years, 497 females and 345 males, representing a 35·6 % response rate from 2500 randomly selected residents. Results: Women were significantly more likely (P 60 years) were more likely to hold positive attitudes towards animal welfare (P<0·01). Less than a fifth (18·4 %) of the sample agreed that the impact of climate change could be reduced by consuming less meat, dairy products and eggs. Positive attitudes towards animal welfare were associated with consuming less meat and a greater frequency of ‘higher welfare’ meat purchases. Conclusions: Human health and animal welfare are more common motivations to avoid red and processed meat than environmental sustainability. Policy makers, nutritionists and health professionals need to increase the public’s awareness of the environmental impact of eating red and processed meat. A first step could be to ensure that dietary guidelines integrate the nutritional, animal welfare and environmental components of sustainable diets.

    Spatially clustered associations in health GIS

    Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial mashups to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and correlation of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control on or knowledge about background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced but full implementation is left for further research.
    (Figure caption: spatial entropy index HSu for the ScankOO analysis of the hypothetical dataset, using a vicinity fixed by the number of points without distinction between their labels; label size is proportional to the inverse of the index.)
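
    A generic illustration of the vicinity-based scan idea follows (it is not the ScankOO statistic or the HSu index of the article): each point's k nearest neighbours define its vicinity, the local rate of a binary label is compared across vicinities, and inference uses label permutation.

        ## Generic sketch of a k-nearest-neighbour scan for a locally elevated rate
        ## of a binary label, with Monte-Carlo permutation inference. Illustrates the
        ## vicinity-based scan idea only; not the ScankOO statistic of the article.
        knn_scan <- function(x, y, label, k = 10, nsim = 199) {
          n <- length(x)
          D <- as.matrix(dist(cbind(x, y)))
          local_rate <- function(lab)
            sapply(seq_len(n), function(i) mean(lab[order(D[i, ])[seq_len(k)]]))
          obs  <- local_rate(label)
          stat <- max(obs)                                   # most anomalous vicinity
          perm <- replicate(nsim, max(local_rate(sample(label))))
          list(local_rate = obs, centre = which.max(obs),
               p_value = (1 + sum(perm >= stat)) / (nsim + 1))
        }

        set.seed(7)
        pts <- data.frame(x = runif(200), y = runif(200))
        lab <- rbinom(200, 1, ifelse(pts$x < 0.3 & pts$y < 0.3, 0.7, 0.2))
        knn_scan(pts$x, pts$y, lab, k = 15)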

    Rapid flood inundation mapping using social media, remote sensing and topographic data

    Flood events cause substantial damage to urban and rural areas. Monitoring water extent during large-scale flooding is crucial in order to identify the area affected and to evaluate damage. During such events, spatial assessments of floodwater may be derived from satellite or airborne sensing platforms. Meanwhile, the increasing availability of smartphones is leading to documentation of flood events directly by individuals, with information shared in real time using social media. Topographic data, which can be used to determine where floodwater can accumulate, are now often available from national mapping agencies or governmental repositories. In this work, we present and evaluate a method for rapidly estimating flood inundation extent based on a model that fuses remote sensing, social media and topographic data sources. Using geotagged photographs sourced from social media, optical remote sensing and high-resolution terrain mapping, we develop a Bayesian statistical model to estimate the probability of flood inundation through weights-of-evidence analysis. Our experiments were conducted using data collected during the 2014 UK flood event and focus on the city of Oxford and surrounding areas. Using the proposed technique, predictions of inundation were evaluated against ground-truth flood extent. The results report on the quantitative accuracy of the multisource mapping process, which obtained area under the receiver operating characteristic curve values of 0.95 and 0.93 for model fitting and testing, respectively.
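
    To show the combination rule behind a weights-of-evidence step in generic terms, the sketch below gives each binary evidence layer a positive weight where it is present and a negative weight where it is absent, added to the prior log-odds of flooding. The layer names and simulated data are illustrative assumptions; this is a sketch of the standard technique, not the authors' fitted model.

        ## Generic weights-of-evidence sketch: binary evidence layers (e.g. "near a
        ## geotagged flood photo", "water detected by remote sensing", "low-lying
        ## terrain") are combined into a posterior probability of inundation.
        ## Layer names and data are illustrative; this is not the authors' model.
        woe_weight <- function(evidence, flooded) {
          c(wplus  = log(mean(evidence[flooded])  / mean(evidence[!flooded])),
            wminus = log(mean(!evidence[flooded]) / mean(!evidence[!flooded])))
        }

        posterior_prob <- function(E, flooded) {
          ## E: logical matrix of evidence layers (rows = grid cells, cols = layers);
          ## flooded: logical vector of observed inundation used to train the weights
          logit <- rep(log(mean(flooded) / (1 - mean(flooded))), nrow(E))  # prior log-odds
          for (j in seq_len(ncol(E))) {
            w <- woe_weight(E[, j], flooded)
            logit <- logit + ifelse(E[, j], w["wplus"], w["wminus"])
          }
          1 / (1 + exp(-logit))               # posterior probability per cell
        }

        set.seed(11)
        E <- cbind(photo_nearby = runif(1000) < 0.2,
                   rs_water     = runif(1000) < 0.3,
                   low_terrain  = runif(1000) < 0.4)
        flooded <- runif(1000) < plogis(-2 + 2 * E[, "low_terrain"] + E[, "rs_water"])
        summary(posterior_prob(E, flooded))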